Search CORE

INRIA a CCSD electronic archive server

Multivariate Analysis and Visualization of Splicing Correlations in Single-Gene Transcriptomes

BACKGROUND: RNA metabolism, through 'combinatorial splicing', can generate enormous structural diversity in the proteome. Alternative domains may interact, however, with unpredictable phenotypic consequences, necessitating integrated RNA-level regulation of molecular composition. Splicing correlations within transcripts of single genes provide valuable clues to functional relationships among molecular domains, as well as genomic targets for higher-order splicing regulation. RESULTS: We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in 'clock plots' and linkage grids. Higher-order correlations are assessed statistically through Monte Carlo analysis of a log-linear model with an empirical-Bayes estimate of the true probabilities of observed and unobserved splice forms. Log-linear coefficients are visualized in a 'spliceprint', a signature of splice correlations in the transcriptome. We present two novel metrics: the linkage change index, which measures the directional change in pair-wise correlation with tissue differentiation, and the accuracy index, a very simple goodness-of-fit metric that is more sensitive than the integrated squared error when applied to sparsely populated tables and that, unlike chi-square, does not diverge at low variance. Considerable attention is given to sparse contingency tables, which are inherent to single-gene libraries. CONCLUSION: Patterns of splicing correlations are revealed that span a broad range of interaction orders and change during development. The methods are broadly applicable beyond the single gene, including, for example, to multiple gene interactions in the complete transcriptome.
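Clock plots and linkage grids summarize pair-wise correlations. Each of those correlations starts from a 2×2 contingency table for one pair of alternative exons. The sketch below computes the standard phi coefficient and a pseudocount-smoothed log odds ratio from such a table. It assumes each full-length cDNA clone is encoded as a tuple of 0/1 exon-inclusion flags, which is a hypothetical encoding. This is not the paper's log-linear, Monte Carlo or empirical-Bayes procedure. The 0.5 (Haldane–Anscombe) pseudocount is one common way to keep sparse tables finite.

```python
import math
from typing import List, Sequence, Tuple

Table = List[List[int]]  # n[a][b]: clones with exon i in state a and exon j in state b


def contingency_2x2(clones: Sequence[Tuple[int, ...]], i: int, j: int) -> Table:
    """Count joint inclusion (1) / exclusion (0) of alternative exons i and j."""
    n = [[0, 0], [0, 0]]
    for clone in clones:
        n[clone[i]][clone[j]] += 1
    return n


def phi_coefficient(n: Table) -> float:
    """Pearson correlation of two binary splice choices (phi coefficient)."""
    (n00, n01), (n10, n11) = n
    row1, row0 = n10 + n11, n00 + n01
    col1, col0 = n01 + n11, n00 + n10
    denom = math.sqrt(row1 * row0 * col1 * col0)
    if denom == 0:
        return float("nan")  # an invariant exon makes the correlation undefined
    return (n11 * n00 - n10 * n01) / denom


def log_odds_ratio(n: Table, pseudocount: float = 0.5) -> float:
    """Log odds ratio; the pseudocount keeps empty cells from giving 0 or infinity."""
    (n00, n01), (n10, n11) = n
    a, b, c, d = (x + pseudocount for x in (n11, n10, n01, n00))
    return math.log((a * d) / (b * c))


# Hypothetical library: each tuple flags inclusion of three alternative exons.
clones = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0), (1, 1, 0), (0, 1, 1)]
table = contingency_2x2(clones, 0, 1)
print(table, round(phi_coefficient(table), 4), round(log_odds_ratio(table), 4))
# [[2, 1], [0, 3]] 0.7071 2.4567
```

With sparse tables, the phi coefficient becomes undefined as soon as one exon is never (or always) included. The smoothed log odds ratio stays finite in that case.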

Collection Of Biostatistics Research Archive

A comprehensive RNA-Seq-based gene expression atlas of the summer squash (Cucurbita pepo) provides insights into fruit morphology and ripening mechanisms

Author: A García
A Maghiaoui
A Obrero
A Snouffer
A Vitiello
AR Sede
C Esteras
D Kim
D-H Kim
E Eisenberg
H Miao
HS Paris
I Mellidou
JF Sánchez-Sevilla
JM Claverie
K Haga
K McKown
M Kanehisa
M Pertea
MD Robinson
P Marowa
R Yano
T Pomares-Viciana
TK Mohanta
Y Sitrit
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Background: Summer squash (Cucurbita pepo: Cucurbitaceae) is a popular horticultural crop for which there is insufficient genomic and transcriptomic information. Gene expression atlases are crucial for the identification of genes expressed in different tissues at various plant developmental stages. Here, we present the first comprehensive gene expression atlas for a summer squash cultivar, including transcripts obtained from seeds, shoots, leaf stem, young and developed leaves, male and female flowers, fruits at seven developmental stages, and primary and lateral roots. Results: In total, 27,868 genes and 2352 novel transcripts were annotated from these 16 tissues, with over 18,000 genes common to all tissue groups. Of these, 3812 were identified as housekeeping genes, half of which were assigned to known gene ontologies. Flowers, seeds, and young fruits had the largest number of specific genes, whilst intermediate-age fruits had the fewest. There were also genes that were differentially expressed across the tissues: the male flower had the most differentially expressed genes in pair-wise comparisons with the remaining tissues, and the leaf stem the fewest. The largest expression change during fruit development occurred early on, from female flower to fruit two days after pollination. A weighted correlation network analysis performed on the global gene expression dataset assigned 25,413 genes to 24 coexpression groups, some of which exhibited strong tissue specificity. Conclusions: These findings enrich our understanding of the transcriptomic events associated with summer squash development and ripening. This comprehensive gene expression atlas is expected not only to provide a global view of gene expression patterns in all major tissues of C. pepo but also to serve as a valuable resource for functional genomics and gene discovery in Cucurbitaceae.
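The counts of genes common to all tissues and of tissue-specific genes rest on a simple set operation over a detection call. The sketch below shows that operation only. The detection threshold, the nested-dict layout and the tissue and gene names are assumptions for illustration. The abstract does not state the study's actual expression cut-off or its housekeeping-gene criteria.

```python
from typing import Dict, Set

Expression = Dict[str, Dict[str, float]]  # tissue -> gene -> expression level


def expressed_genes(expr: Expression, threshold: float) -> Dict[str, Set[str]]:
    """Genes at or above the (assumed) detection threshold in each tissue."""
    return {tissue: {gene for gene, level in genes.items() if level >= threshold}
            for tissue, genes in expr.items()}


def common_genes(expressed: Dict[str, Set[str]]) -> Set[str]:
    """Genes detected in every tissue."""
    sets = list(expressed.values())
    return set.intersection(*sets) if sets else set()


def specific_genes(expressed: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Genes detected in exactly one tissue, keyed by that tissue."""
    result = {}
    for tissue, genes in expressed.items():
        elsewhere = set().union(*(g for t, g in expressed.items() if t != tissue))
        result[tissue] = genes - elsewhere
    return result


# Hypothetical toy values (e.g. TPM) for three genes in three tissues.
expr = {
    "seed": {"g1": 10.0, "g2": 0.0, "g3": 5.0},
    "leaf": {"g1": 8.0, "g2": 3.0, "g3": 0.0},
    "root": {"g1": 2.0, "g2": 0.0, "g3": 0.0},
}
detected = expressed_genes(expr, threshold=1.0)
print(sorted(common_genes(detected)))  # ['g1']
print({t: sorted(g) for t, g in specific_genes(detected).items()})
```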

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

RiuNet

Statistical significance of cis-regulatory modules

Author: A Kel
A Klingenhoff
A Sandelin
A Sosinsky
A Wagner
A Webber
AA Philippakis
Andrew D Smith
AP Lifanov
BP Berman
D GuhaThakurta
DS Johnson
Dustin E Schones
E Eskin
EM McCreight
F Tronche
G Hertz
GD Stormo
J van Helden
JM Claverie
JS Liu
K Struhl
M Beckstette
M Blanchette
M Gupta
MA Beer
MC Frith
Michael Q Zhang
N Munshi
N Nagarajan
N Rajewsky
O Johansson
P Leighton
Q Zhou
R Hoberman
R Staden
RR Sokal
S Aerts
S Rahmann
S Sinha
TD Schneider
TL Bailey
TL Baily
V Matys
W Kent
W Thompson
WB Alkema
WW Wasserman
YH Grad
Z Xuan
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: It is becoming increasingly important for researchers to be able to scan large genomic regions for transcription factor binding sites or for clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited to rapid, genome-scale scanning. RESULTS: We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled either as clusters of individual binding sites or as combinations of sites with constrained organization. To determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches that uses a database of known promoters to produce data structures from which p-values for binding site matches can be estimated. We next introduce a technique to calculate the statistical significance of the arrangement of binding sites within a module using a max-gap model. If the module being scanned for has defined organizational parameters, the probability of the module is corrected to account for those organizational constraints. The statistical significance of single site matches and of the architecture of sites within the module can be combined to provide an overall estimate of the statistical significance of cis-regulatory module sites. CONCLUSION: The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM) and MODSTORM software.
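The single-site p-value idea can be illustrated with a position weight matrix and a sorted list of background scores. This is a minimal sketch, not STORM's implementation. The background list stands in for the promoter-database data structures the paper describes. The 0.5 pseudocount and log2 scoring are common but assumed choices. Only the forward strand is scanned, and the max-gap module statistics are not covered. The empirical p-value uses a (count + 1) / (n + 1) correction so that it is never exactly zero.

```python
import math
from bisect import bisect_left
from typing import Dict, List, Sequence

BASES = "ACGT"
Matrix = List[Dict[str, float]]


def log_odds_matrix(counts: Sequence[Dict[str, int]], background: Dict[str, float],
                    pseudocount: float = 0.5) -> Matrix:
    """Turn per-position base counts into log2-odds scores against background frequencies."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
                       for b in BASES})
    return matrix


def score(matrix: Matrix, site: str) -> float:
    """Sum of per-position scores; site must contain only A, C, G, T."""
    return sum(column[base] for column, base in zip(matrix, site))


def scan(matrix: Matrix, seq: str) -> List[float]:
    """Score every full-length window on the forward strand."""
    seq = seq.upper()
    width = len(matrix)
    return [score(matrix, seq[i:i + width]) for i in range(len(seq) - width + 1)]


def empirical_p_value(observed: float, sorted_background: List[float]) -> float:
    """Fraction of background scores >= observed, with a +1 correction."""
    n_ge = len(sorted_background) - bisect_left(sorted_background, observed)
    return (n_ge + 1) / (len(sorted_background) + 1)


# Hypothetical 2-bp motif and a toy "background" sequence standing in for promoters.
m = log_odds_matrix([{"A": 9, "C": 1}, {"G": 10}], {b: 0.25 for b in BASES})
background = sorted(scan(m, "ACGTTGCAAGTC"))
print(empirical_p_value(score(m, "AG"), background))
```

Keeping the background scores sorted means each p-value lookup is a binary search rather than a pass over every background score. This is one simple way to make repeated lookups cheap at genome scale.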

Cold Spring Harbor Laboratory Institutional Repository

arXiv.org e-Print Archive

The quest for the solar g modes

Author: A Antoniou
A Boury
A Claverie
A Gautschy
A Grigahcène
A Jiménez
A Kolmogorov
A Moya
A Noels
A Thoul
A Vecchio
A Zaatri
A. Gabriel
A. Jiménez
A. Kosovichev
A.-M. Broomhall
AB Severnyi
AG Kosovichev
AG Polnarev
AH Gabriel
AM Broomhall
AN Cox
AS Brun
B Dintrans
B. N. Andersen
BL Sawford
BN Andersen
C Aerts
C Eckart
C Fröhlich
C Waelkens
C. Fröhlich
CE Shannon
CG Toner
CL Wolff
CR Proffitt
CS Rosenthal
D Salabert
D. O. Gough
DB Guenther
DE Winget
DG Wentzel
DGT Denison
DJ Thomson
DO Gough
E Böhm-Vitense
E Caffau
E Fossat
E Schatzman
EA Spiegel
EG Adelberger
EM Green
F Baudin
F Kupka
F Varadi
F. Baudin
FL Deubner
FWJ Olver
FWW Dilke
G Batchelor
G Berthomieu
G Giamperi
G Grec
G Houdek
G. Grec
G. Houdek
GR Caughlan
GW Hoogeveen
H Ando
H Jeffreys
H Nyquist
H Saio
H Shibahashi
HA Hill
HM Antia
J Ballot
J Berger
J Christensen-Dalsgaard
J Dyson
J Eisenfeld
J Montalbán
J Provost
J. Provost
JA Guzik
JD Scargle
JL Auré
JM Burgers
JM Wersinger
JN Bahcall
JP Cox
JP Poyet
JP Zahn
JR Elliott
JR Oleson
JRI Gott
JW Harvey
K Belkacem
K. Belkacem
L Bertello
L Damé
L Gizon
L Koopmans
M Asplund
M Castro
M Dikpati
M Gabriel
M Takata
M Tassoul
M Woodard
MA Dupret
MH Pinsonneault
MJ Lighthill
MJ Thompson
MN Rosenbluth
MS Miesch
N Grevesse
N Mordant
NH Baker
NJ Balmforth
O Andreassen
O Richard
P Delache
P Garaud
P Goldreich
P Kumar
P Ledoux
P Morel
P. Boumier
PA Sturrock
PH Scherrer
PR Goode
R Burston
R Samadi
R Scuflaire
R Stein
R. A. García
RA García
RB Leighton
RC Willson
RF Stein
RH Kraichnan
RW Komm
S Aigrain
S Basu
S Couvidat
S Frandsen
S Mathis
S Mathur
S Talon
S Turck-Chièze
S. Turck-Chièze
T Appourchaux
T Corbard
T Sellke
T Stahn
T Toutain
T. Appourchaux
T. Sekii
T. Toutain
TG Cowling
TI Rashba
TL Duvall Jr
TM Brown
TM Rogers
V Domingo
VA Baturin
VM Canuto
W Dziembowski
W Finsterle
W Unno
W. Finsterle
W. J. Chaplin
WA Dziembowski
WH Press
WJ Chaplin
WM Yang
WT Ni
Y Lebreton
Y Osaki
Y. Elsworth
YV Vandakurov
ZE Musielak
Å Pamyatnykh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/10/2009

Solar gravity modes (or g modes) -- oscillations of the solar interior for which buoyancy acts as the restoring force -- have the potential to provide unprecedented inference on the structure and dynamics of the solar core, inference that is not possible with the well-observed acoustic modes (or p modes). The high amplitude of the g-mode eigenfunctions in the core and the evanescence of the modes in the convection zone make the modes particularly sensitive to the physical and dynamical conditions in the core. Owing to the existence of the convection zone, the g modes have very low amplitudes at photospheric levels, which makes them extremely hard to detect. In this paper, we review the current state of play regarding attempts to detect g modes. We review the theory of g modes, including theoretical estimation of the g-mode frequencies, amplitudes and damping rates. We then discuss the techniques that have been used to try to detect g modes. We review results in the literature and finish by looking to the future and the potential advances that can be made -- from both data and data-analysis perspectives -- to give unambiguous detections of individual g modes. The review concludes that, at the time of writing, there is indeed a consensus amongst the authors that there is currently no undisputed detection of solar g modes. Comment: 71 pages, 18 figures, accepted by Astronomy and Astrophysics Review
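A common starting point for the theoretical estimation of g-mode frequencies is the first-order asymptotic relation for high-order modes: P(n, l) ≈ P0 (n + phase) / √(l(l + 1)). Here P0 = 2π² / ∫ N dr/r, where N is the buoyancy (Brunt–Väisälä) frequency integrated over the radiative interior. The sketch below only evaluates that relation; it does not compute P0. The caller must supply P0 and the phase constant (often written l/2 + θ). The radial order n is taken as a positive integer here, and no solar value of P0 is assumed.

```python
import math
from typing import Iterable, List


def period_spacing(p0: float, degree: int) -> float:
    """Constant spacing between consecutive radial orders at fixed degree l."""
    if degree < 1:
        raise ValueError("g modes require degree l >= 1")
    return p0 / math.sqrt(degree * (degree + 1))


def asymptotic_g_mode_periods(p0: float, degree: int, orders: Iterable[int],
                              phase: float = 0.0) -> List[float]:
    """First-order asymptotic periods P = p0 * (n + phase) / sqrt(l * (l + 1)).

    p0 and phase are caller-supplied; results share p0's time unit.
    """
    spacing = period_spacing(p0, degree)
    return [spacing * (n + phase) for n in orders]


# Placeholder P0 in minutes; the ratio of the l = 1 and l = 2 spacings is
# sqrt(3) regardless of the P0 chosen.
p0 = 30.0
print(asymptotic_g_mode_periods(p0, 1, range(1, 4)))
print(period_spacing(p0, 1) / period_spacing(p0, 2))
```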

University of Birmingham Research Portal

HAL-UNICE

HAL-INSU

Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

Author: BFJ Manly
CI Castillo-Davis
David Johnson
DB Searls
DD Womble
E Badidi
F Antequera
J Krueger
J Theilhaber
JD Wren
JF Costello
JM Claverie
Jonathan D Wren
JR Quinlan
K Davies
K Nakai
L Stein
Le Gruenwald
LV Zhang
M Ashburner
M Gardiner-Garden
M Safran
P Clark
RS Michalski
S Foissac
S Muggleton
SP Shah
TV Venkatesh
V Bajic
W Frawley
WM Shui
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2005

There is an enormous amount of information encoded in each genome, enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features vary in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate the exploration of genomes by iteratively searching sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and proved successful in detecting coinciding genomic features among genes, exons, repetitive elements and CpG islands.
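The sketch below is a rough illustration of the first and third steps. It assumes annotations arrive as 0-based, half-open intervals per feature type, and the feature names are hypothetical. The sequence matrix is held as one boolean row per feature. A simple observed-versus-expected overlap stands in for the paper's correlative analyses. The classification tree that decides which analyses suit which row types is not modeled.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def build_sequence_matrix(length: int,
                          annotations: Dict[str, List[Interval]]) -> Dict[str, List[bool]]:
    """One row per feature type; row[i] is True if position i lies in any interval of that type."""
    matrix = {}
    for feature, intervals in annotations.items():
        row = [False] * length
        for start, end in intervals:
            for i in range(max(0, start), min(length, end)):  # clip to the region
                row[i] = True
        matrix[feature] = row
    return matrix


def coincidence(matrix: Dict[str, List[bool]], a: str, b: str) -> Tuple[int, float]:
    """Observed overlap (in bases) of two rows, and the overlap expected if they were independent."""
    row_a, row_b = matrix[a], matrix[b]
    observed = sum(x and y for x, y in zip(row_a, row_b))
    expected = sum(row_a) * sum(row_b) / len(row_a)
    return observed, expected


# Hypothetical 10-base region with three annotation sources.
sm = build_sequence_matrix(10, {
    "gene": [(0, 5)],
    "cpg_island": [(0, 3)],
    "repeat": [(6, 10)],
})
print(coincidence(sm, "gene", "cpg_island"))  # (3, 1.5)
print(coincidence(sm, "gene", "repeat"))      # (0, 2.0)
```

A per-base boolean row is memory-hungry for whole chromosomes. Interval trees or run-length encoding would be the usual alternatives, but this layout keeps the position-dependent idea easy to see.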

Assessment of clusters of transcription factor binding sites in relationship to human promoter, CpG islands and gene expression

Author: A Wagner
AE Kel
B Lenhard
B Shea
BP Berman
DA Papatsenko
DS Prestridge
F Larsen
GD Stormo
GG Loots
JA Warrington
JM Claverie
K Quandt
KD Pruitt
L Ponger
LL Hsiao
M Gardiner-Garden
MC Frith
MI Arnone
MS Halfon
N Rajewsky
O Johansson
R Ihaka
RR Sokal
S Aerts
S Hannenhalli
S Levy
TD Schneider
V Matys
V Solovyev
W Krivan
WH Press
WJ Ewens
WJ Kent
WW Wasserman
Y Suzuki
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searches can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent the positional base frequencies of collected, experimentally determined TFBS. A disadvantage of this approach is the large number of results it produces for genomic DNA. One strategy for identifying genuine TFBS is to exploit local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster in promoter regions, although this is the case for certain TFBS. It is also unclear which TFs have TFBS concentrated in promoters, and to what degree. This study aims to answer some of these questions. RESULTS: We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprised 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprised 11%. After controlling for the effect of CpG islands (CGI) on the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups: those strongly correlated with CGI and those not correlated with CGI. CONCLUSION: Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions.
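The abstract does not spell out how the partial correlations were computed. For reference, the standard first-order partial correlation between x and y, controlling for z, is r_xy·z = (r_xy - r_xz r_yz) / √((1 - r_xz²)(1 - r_yz²)). The sketch below implements that formula with plain Pearson correlations. The inputs would be per-sequence values for promoter status, CGI overlap and TFBS clustering; the toy numbers here are hypothetical. The paper's cluster score itself is not reproduced.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """First-order partial correlation of x and y with the linear effect of z removed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data: x and y are both driven by z, so their raw correlation (0.5)
# vanishes once z is controlled for.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]
y = [2, 0, -2, 0]
print(pearson(x, y))                             # 0.5
print(abs(partial_correlation(x, y, z)) < 1e-9)  # True
```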

Public Library of Science (PLOS)

Lethal Mutants and Truncated Selection Together Solve a Paradox of the Origin of Life

Author: A Kun
A Wolff
AS Kondrashov
C de Duve
Chin-Kun Hu
Christof K. Biebricher
CO Ofria
CO Wilke
D Bonnaz
David B. Saakian
DB Saakian
DC Krakauer
E Baake
E Munoz
EV Nimwegen
H Tejero
HE Stanley
HG Schuster
I Budin
J Hermisson
J Summers
J Zorn
JJ Bull
JM Claverie
JM Park
Joseph Najbauer
JP Schrum
JR Peck
K Sato
K Zahnle
M Eigen
MA Huynen
MW Powner
N Ichihashi
N Takeuchi
O Schueler-Furman
R Gil
R Sanjuan
S Kauffman
S Rajamani
SS Mansy
T Inoue
TA Lincoln
UJ Meierhenrich
VS Pande
WK Johnston
Z Avetisyan
Z Kirakosyan
Publication venue: Public Library of Science
Publication date: 01/01/2011

BACKGROUND: Many attempts have been made to describe the origin of life, one of which is Eigen's cycle of autocatalytic reactions [Eigen M (1971) Naturwissenschaften 58, 465-523], in which primordial life molecules are replicated with limited accuracy through autocatalytic reactions. For successful evolution, the information carrier (RNA, DNA or their precursor) must be transmitted to the next generation with a minimal number of misprints. In Eigen's theory, the maximum chain length that can be maintained is restricted to 100-1000 nucleotides, whereas the most primitive genome is around 7000-20,000 nucleotides long. This is the famous error catastrophe paradox. How to solve this puzzle is an interesting and important problem in the theory of the origin of life. METHODOLOGY/PRINCIPAL FINDINGS: We use methods of statistical physics to solve this paradox by carefully analyzing how neutral mutants, lethal mutants and truncated selection (i.e., fitness is zero beyond a certain Hamming distance from the master sequence) affect the critical chain length. While neutral mutants play an important role in evolution, they do not provide a solution to the paradox. We have found that lethal mutants and truncated selection together can solve the error catastrophe paradox. There is a fundamental difference between the prebiotic molecule self-replication and proto-cell self-replication stages in the origin of life. CONCLUSIONS/SIGNIFICANCE: We have applied methods of statistical physics to make an important breakthrough in the molecular theory of the origin of life. Our results will inspire further studies on the molecular theory of the origin of life and biological evolution.
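The error catastrophe paradox rests on Eigen's classic threshold, shown here in its textbook form for context. This is not the statistical-physics model of this paper, which adds lethal mutants and truncated selection. Suppose a master sequence of length L is copied with per-base fidelity Q and has selective superiority σ over its mutants. It is maintained only while Q^L σ > 1, i.e. L < ln σ / (-ln Q) ≈ ln σ / (1 - Q). The sketch below evaluates that bound. The fidelity and superiority values in the example are arbitrary placeholders, chosen only to show how strongly the limit depends on the per-base error rate.

```python
import math


def eigen_max_length(fidelity: float, superiority: float) -> float:
    """Chain length at which fidelity**L * superiority == 1 (Eigen's error threshold).

    Longer chains fall below the threshold and the master sequence is lost.
    """
    if not 0.0 < fidelity < 1.0:
        raise ValueError("per-base fidelity Q must lie strictly between 0 and 1")
    if superiority <= 1.0:
        raise ValueError("superiority sigma must exceed 1")
    return math.log(superiority) / -math.log(fidelity)


# Placeholder parameters: per-base error rates of 1% and 0.1%, superiority 10.
for q in (0.99, 0.999):
    print(q, round(eigen_max_length(q, 10.0)))
```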

Public Library of Science (PLOS)

MPG.PuRe

microPIR: An Integrated Database of MicroRNA Target Sites within Human Promoter Sequences

Author: A Grimson
A Siepel
AFA Smit
BA Janowski
BM Engels
BP Lewis
C Zhang
Chaiwat Bootchai
Chumpol Ngamphiw
D Karolchik
DH Kim
DP Bartel
E Blanco
F Xiao
G Ruvkun
H Dweep
I. King Jordan
IH Consortium
J Kruger
J Piriyapongsa
JC Carrington
Jittima Piriyapongsa
JM Claverie
LC Li
LD Stein
M Ashburner
M Blanchette
M Hafner
M Hirakawa
M Kanehisa
M Khorshid
PA Fujita
Q Jiang
RF Place
RH Waterston
S Griffiths-Jones
S Nam
S Volinia
SD Hsu
Sissades Tongsima
ST Sherry
ST Younger
V Ambros
W Filipowicz
X Wang
Y Huang
Publication venue: Public Library of Science
Publication date: 16/03/2012

Background: microRNAs are generally understood to regulate gene expression by binding to target sequences within the 3′-UTRs of mRNAs. Therefore, computational prediction of target sites is usually restricted to these gene regions. Recent experimental studies, however, have suggested that microRNAs may alternatively modulate gene expression by interacting with promoters. A database of potential microRNA target sites in promoters would stimulate research in this field, leading to a better understanding of complex microRNA regulatory mechanisms. Methodology: We developed a database, called microPIR (microRNA-Promoter Interaction Resource), that hosts predicted microRNA target sites located within human promoter sequences together with their associated genomic features. microRNA seed sequences were used to identify perfectly complementary matching sequences in the human promoters, and the potential target sites were predicted using the RNAhybrid program. Approximately 15 million target sites were identified, located within 5000 bp upstream of all human genes on both sense and antisense strands. Experimentally confirmed argonaute (AGO) binding sites and EST expression data, together with the conservation of each predicted target across vertebrate species, are presented so that researchers can appraise the quality of the predicted target sites. The microPIR database integrates various annotated genomic sequence databases, e.g. repetitive elements, transcription factor binding sites, CpG islands, and SNPs, allowing users to explore extensively the relationships between target sites and other genomic features.
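The first step of the pipeline is finding promoter sequences perfectly complementary to a microRNA seed on either strand. It can be sketched as below, under these assumptions:
- the seed is nucleotides 2-8 of the mature microRNA, a common convention that the abstract does not define;
- the promoter is given 5'->3' on the sense strand;
- the function names are hypothetical.

The RNAhybrid step, the 5000 bp promoter extraction and the annotation layers are not covered.

```python
from typing import List, Tuple

RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Sense-strand DNA site (5'->3') perfectly complementary to miRNA positions start..end (1-based)."""
    seed = mirna.upper()[start - 1:end]
    return seed.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def reverse_complement(dna: str) -> str:
    return dna.translate(DNA_COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> List[int]:
    """All (possibly overlapping) 0-based start positions of pattern in text."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def seed_matches(mirna: str, promoter: str) -> List[Tuple[str, int]]:
    """(strand, 0-based start on the sense sequence) of perfect seed matches."""
    promoter = promoter.upper()
    site = seed_site(mirna)
    hits = [("+", i) for i in find_all(promoter, site)]
    # A site on the antisense strand appears on the sense strand as its reverse complement.
    hits += [("-", i) for i in find_all(promoter, reverse_complement(site))]
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a sequence
print(seed_site(let7a))                             # CTACCTC
print(seed_matches(let7a, "AACTACCTCGGGAGGTAGTT"))  # [('+', 2), ('-', 11)]
```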

CiteSeerX

Defining Life: The Virus Viewpoint

Author: A Lwoff
B Scola La
C Bandea
CA Suttle
CR Woese
D Prangishvili
D Raoult
DH Bamford
E Schrödinger
ES Miller
EV Koonin
F Engels
H Pearson
J Brosius
J Filée
J Sapp
JG Bragg
JM Claverie
LP Villarreal
M Häring
M Jalasvuori
M Krupovic
M Suzan-Monti
M Takemura
ML Baker
N Parseval De
NR Pace
O Lecompte
P Forterre
Patrick Forterre
PJ Bell
RA Edwards
RF Ryan
RR Novoa
S Miller
S Prudhomme
Publication venue: Springer Netherlands
Publication date: 01/01/2010

Are viruses alive? Until very recently, the answer to this question was often negative, and viruses were not considered in discussions on the origin and definition of life. This situation is rapidly changing, following several discoveries that have modified our view of viruses. It has been recognized that viruses have played (and still play) a major innovative role in the evolution of cellular organisms. New definitions of viruses have been proposed, and their position in the universal tree of life is actively discussed. Viruses are no longer confused with their virions, but can be viewed as complex living entities that transform the infected cell into a novel organism, the virus, which produces virions. I suggest here defining life (a historical process) as the mode of existence of ribosome-encoding organisms (cells) and capsid-encoding organisms (viruses) and their ancestors. I propose to define an organism as an ensemble of integrated organs (molecular or cellular) producing individuals that evolve through natural selection. The origin of life on our planet would correspond to the establishment of the first organism corresponding to this definition.